Abstract

This paper reports an analysis of state statutes and department of education
regulations in fifty states for changes in teacher evaluation since the passage
of the No Child Left Behind Act of 2001. We asked what the policy activity for teacher
evaluation is in state statutes and department of education regulations, how these
changes in statutes and regulations might affect the practice of teacher evaluation,
and what the implications of these policy actions are for instructional supervision.
Teacher evaluation statutes and department of education regulations
provided the data for this study. Archival records from each state's legislature
and education department were placed into a comparison matrix based on
criteria developed from the National Governors Association (NGA) goals for
school reform (Goldrick, 2002). Data were analyzed deductively in terms of these
criteria for underlying theories of action (Malen, 2005), trends, and likely effects on
teacher evaluation and implications for supervision. The majority of states adopted
many of the NGA strategies, asserted oversight and involvement in local teacher
evaluation practices, decreased the frequency of veteran teacher evaluation, and
increased the types of data used in evaluation. Whether or not the changes in
teacher evaluation will improve student learning in the long run remains to be seen.
Keywords: teacher evaluation; educational policy; elementary secondary education;
supervision.

1 A preliminary version of this paper was presented at the Annual Meeting of the American Educational
Research Association (AERA), April 2006, San Francisco, and received the 2006 AERA Distinguished Paper Award of
the Supervision and Instructional Leadership Special Interest Group.

Teacher Evaluation as a Policy Target for Improving Student
Learning: An Examination of the Statutes and Regulatory Actions
of Fifty States since NCLB

Summary

This document presents an analysis of state laws and department of education
regulations in fifty states regarding changes in teacher evaluation since the
implementation of the No Child Left Behind Act. The questions guiding this work were:
What is the policy activity for teacher evaluation in state laws and department of
education regulations? How did the changes in statutes and regulations affect
teacher evaluation practices? What were the consequences of these policy actions
for the supervision of instruction? The data for this study come from teacher
evaluation statutes and department of education regulations. Using records from the
archives of each state's legislature and department of education, the data were
placed in a comparison matrix based on criteria developed to evaluate the school
reform goals of the National Governors Association (NGA) (Goldrick, 2002). Based on
those criteria, the data were analyzed deductively to determine whether there was
an implicit theory of action (Malen, 2005), as well as trends, effects on teacher
evaluation, and consequences for supervision. The majority of states adopted many of
the NGA strategies, emphasizing oversight of and participation in local teacher
evaluation practices, a decrease in the frequency of evaluations of experienced
teachers, and an increase in the types of data used in evaluations. To what extent
these changes in teacher evaluation will improve student learning in the long run
remains to be seen.
Keywords: teacher evaluation; educational policy; elementary school, secondary
school; supervision.


Introduction 


Throughout its complex history, supervision has held the promise of improving teachers
and their classroom instruction (Hazi & Arredondo Rucinski, 2005, 2006), largely because
supervision is usually understood in the schools as teacher evaluation (see Holland, 2005,
among others). With the adoption and implementation of No Child Left Behind (U.S. Department
of Education, 2002) and the resultant call by the National Governors Association (NGA) to target 
teacher evaluation policy as a way to achieve the goal of a highly qualified teacher in every 
classroom, policy makers focused efforts on this promise to improve student learning (Goldrick, 
2002). The NGA identified six policy goals for improving student learning: define teacher quality,
focus evaluation policy on improving teaching practices, incorporate student learning into teacher 
evaluation, create professional accountability through developing career ladders, train evaluators in 
pre-service programs, and broaden participation in evaluation designs (Goldrick, 2002). While the 
national scene has shifted away from a direct focus on teacher evaluation in recent months, it seems 
likely that because of the renewed interest in pay for performance plans, evaluation will soon be a 
policy target once more. 

Purpose and Research Questions 

The purpose of this research is to determine the extent to which the identified NGA goals 
appear in individual state statutes and regulations, and to consider the likely effects on teacher 
evaluation and the implications for instructional supervision. Thus, we focus on three research 
questions: First, what is the policy activity for teacher evaluation in state statutes and department of 
education regulations? Second, how might these changes in statutes and regulations affect the 
practice of teacher evaluation? Third, what implications for instructional supervision are likely to 
result from these policy actions? 

Background 

Teacher evaluation statute and policy have long been topics of research (e.g., Furtwengler,
1995B; Wise, Darling-Hammond, McLaughlin & Bernstein, 1984; Wuhs & Manatt, 1983; Zirkel,
1979-80). Prior to the 1980s, teacher evaluation was left to local discretion (Veir & Dagley, 2002;
Zirkel, 1979-80). Since the 1980s, however, policy activity has tended to ebb and flow with various
national initiatives. For example, in response to A Nation at Risk (The National Commission on
Excellence in Education, 1983), some states targeted teacher evaluation to upgrade teacher quality
(Hazi & Garman, 1988). Also, Furtwengler (1995) found that states enacted their first requirements
for teacher evaluation; specified criteria, procedures, tenure and instruments; attempted performance 
evaluation systems; and offered training in evaluation. Furthermore, states in the southeast were 
more active and detailed in their revisions, while those in the northeast had the least regulation of 
teacher evaluation. 

As a result of No Child Left Behind’s demand for highly qualified teachers in every 
classroom, teacher evaluation became a policy target in the states. The National Governors 
Association targeted evaluation as “a tool for instructional improvement” (Goldrick, 2002, p. 3). 
Since the National Governors Association is one of the organizations most influential over 
educational policy in the United States (Swanson & Barlage, 2006), it is important to see how this
organization has influenced teacher evaluation policy in the states during this era of accountability, 
especially since its practice has been historically a matter of local judgment and discretion. Initially 
we wondered whether some states would be more prescriptive than others in their approach to 
teacher evaluation, and whether there was a trend to embed recommended practices from 
supervision and professional development into state statute and department of education 
regulations. 

Also, while some scholars hold the opinion that traditional forms of evaluating teachers (i.e., 
supervisory observation) have “served the profession of teaching well for decades,” are “by and 
large unproblematic,” and are not the “hot button policy issues in current political debates” (Glass, 
2004, p. 1), we believe that evaluation is flawed, contested, and problematic. We believe that existing
evaluation statutes and regulations will be changed to try to make teachers more accountable
through this highly ritualistic procedure, and in so doing, will further complicate a flawed practice. In 
addition, there has been much confusion about supervision and evaluation. Researchers tend to view 
them as separate processes, while practitioners believe them to be synonymous (e.g., Holland, 2005). 
We attempt to differentiate supervision and evaluation by purpose, i.e., supervision as the helping or 
teacher professional development function, and evaluation as the personnel function. In fact, much 
effort was expended during the 1960s and 1970s to refocus evaluation so that it was more 
democratic and to change the perception of supervision from that of an “evaluative function” to 
that of a “helping function” through clinical supervision models. 

This tension between the dual purposes of evaluation and supervision is not new. According 
to Glanz (1998), it has been evident in the supervision literature since the work of Hosic (1920). 
Commenting on the intractability of these purposes, Glanz (p. 64) cited Tanner and Tanner as 
having argued (in 1987) that the conflicting duality in purpose has presented an “almost 
insurmountable dilemma for educators” and “is probably the most serious and, up until now, 
unresolved problem in the field of supervision.” Conflicting perspectives about teacher learning may 
be even older than Hosic's work. In the early 1900s John Dewey developed a theory of instrumental 
education that advanced the notion that engagement in real world problem solving with intelligent 
thought and action constituted learning. According to Dewey (1910) authentic learning only occurs 
when human beings focus their attentions, energies, and abilities on solving dilemmas and 
complexities while reflecting on their experiences. This view of learning appears especially relevant 
to both supervisors and teachers, and with Dewey and others advocating authentic learning 
experiences for students, it seems likely that some supervisors focused time and energy on helping 
teachers think about their teaching in ways that stimulated their own learning. And, as supervision 
and evaluation processes became more democratic, emphasis on the goal of professional 
development of teachers for the purpose of improving classroom instruction became increasingly 
prevalent. 

While some clinical supervision models both implied and intended reflection as professional 
development, a direct focus on using teacher reflection as a strategy for improving teaching did not 
appear as a significant part of the supervision literature until the 1970s. According to Pajak (1993), 
developmental and reflective supervision models first began to appear following the publication of 
Schön’s (1983) book on the reflective practitioner. Garman (1982) was among the first to write
about reflective practice for in-service supervision, while Zeichner and Liston (1987), Grimmett and 
Erickson (1988) and others made reflection popular in pre-service teacher education. 

As noted by Glanz (1998) concomitant conflicting trends were evident in the field. During 
this same time period, much of the effective teaching literature moved toward more technical or 
didactic models of teaching (see e.g., Acheson & Gall, 1980; Hunter & Russell, 1977; Joyce & 
Showers, 1982). Similarly, an emphasis on principals as instructional leaders became more prevalent, 
along with the increasingly technical/didactic models of supervision (Acheson & Gall, 1980; Hunter,
1986; Pajak, 1993). This was especially true on the West Coast of the United States. As “effective
teaching models” and “effective learning environments” were described in considerable detail
(Bransford & Vye, 1989; Brophy & Good, 1984; Marzano, Pickering, Arredondo, Blackburn,
Brandt, & Moffett, 1992; 1996), developmental models of supervision (Glickman, Gordon, & Ross-
Gordon, 1985; Sergiovanni & Starratt, 1993), models for cognitive coaching (Costa & Garmston,
1994), and reflection models for mentoring teacher development (Arredondo & Rucinski, 1998;
Reiman & Thies-Sprinthall, 1998), among others, were promulgated. Along with these models,
arguments about supervision’s dual and conflicting purposes — the helping vs. evaluating purposes — 
continued and further developed. The ongoing tension is currently reflected in views of supervision 
as instructional leadership. As states have moved to adopt the National Governors Association 
strategies for defining teaching quality, and adding practices that encourage professional
development, the implied theory of action is that increasing professional teacher behaviors through 
development activities and embedding these into state statute and policy regulations will lead to 
improved student learning. 


Methods 

Teacher evaluation statutes and department of education regulations provided the data for 
this study. These data were accessed through the websites of each state's legislature and education 
departments and collected in three phases. Both statute and regulation were reviewed in each state 
since these usually work in tandem. While statute typically provides a minimalist’s perspective on 
such items as evaluation procedure, due process, and grounds for dismissal, it is state regulation that 
provides the details of its practice. Some practices, however, may have found their way into statute 
and, thus, become institutionalized (e.g., aspects of clinical supervision in the 1970s such as the pre- 
conference). In this research the statute was viewed as the foundation and the state’s regulation was 
considered its details. In the analysis, both were examined to reveal the extent to which procedures 
and practices have become embedded within statute, and thus, less likely to be amended. Manuals 
and other documents on a state’s website were also included when available. 

Various sources were used to construct a comparison matrix to collect and analyze the state 
statutes and policies. The six NGA policy strategies (Goldrick, 2002) were used to first create criteria 
for the matrix. In addition, the work of Furtwengler (1995), Peterson
(2004), Pipho (1991), Rossow and Tate (2003), Wise et al. (1984), and Zirkel (1996) was also
consulted. As new criteria emerged to account for novel or unanticipated changes in state statute
and policy, categories reflecting these criteria were added to the matrix. 
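
A comparison matrix of this kind can be pictured as a table with one row per state and one column per criterion, where columns may be appended as new criteria emerge. The following Python sketch is only an illustration under that assumption; the names `criteria`, `matrix`, `record`, and `row`, the shortened criterion labels, and the placeholder state code "XX" are hypothetical and are not drawn from the authors' actual instrument.

```python
from collections import defaultdict

# Starting columns: the six NGA policy strategies named in the text (labels shortened).
criteria = [
    "define teacher quality",
    "focus evaluation on improving teaching practice",
    "incorporate student learning",
    "create professional accountability through career ladders",
    "train evaluators",
    "broaden participation",
]

# matrix[state][criterion] -> short note on what the statute or regulation says
matrix = defaultdict(dict)


def record(state, criterion, note):
    """Store a finding; append the criterion as a new column if it is not yet listed."""
    if criterion not in criteria:
        criteria.append(criterion)
    matrix[state][criterion] = note


def row(state):
    """Return one state's row, with None where no finding has been recorded."""
    return [matrix[state].get(c) for c in criteria]


# Hypothetical entries for a placeholder state "XX" (not real data).
record("XX", "train evaluators", "hypothetical note")
record("XX", "an emergent criterion", "hypothetical note")  # adds a seventh column
```

Keeping the criteria in an ordered list means that every state's row has the same columns, so a criterion added late in data collection shows up as an empty cell for states coded earlier.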

Once evaluation statutes and regulations were collected and analyzed, it seemed helpful to 
categorize the levels of state control over teacher evaluation practices, especially since some states 
went to great lengths to achieve oversight, while others left much to local discretion. We therefore 
developed a four-level state control rating. In Level 1, the least prescriptive, the state department of
education delegates choice and control of the evaluation policy, criteria, and instrument to the
local school district; thus, Level 1 is local discretion. In Level 2, the state allows the evaluation policy,
criteria, and instrument to be determined locally, but must approve, monitor, and/or inspect them. Here,
the state has remote control. In Level 3, definitional control, the state is more involved locally by
specifying the criteria by which teachers are to be evaluated. In Level 4, procedural control, the state
is most involved with local practices by specifying the instrument and/or procedures by which
teachers are to be evaluated.
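
The four levels can be read as an ordered decision rule. The sketch below is one possible reading, not the authors' coding procedure: it assumes that each state's documents have been reduced to three yes/no codes (the field names are hypothetical) and that a state is assigned the most prescriptive level whose condition is met.

```python
from dataclasses import dataclass
from enum import IntEnum


class ControlLevel(IntEnum):
    LOCAL_DISCRETION = 1  # district chooses policy, criteria, and instrument
    REMOTE_CONTROL = 2    # district chooses; state approves, monitors, and/or inspects
    DEFINITIONAL = 3      # state specifies the evaluation criteria
    PROCEDURAL = 4        # state specifies the instrument and/or procedures


@dataclass
class StateProvisions:
    # Hypothetical yes/no codes recorded from a state's statute and regulations.
    state_reviews_local_system: bool = False
    state_specifies_criteria: bool = False
    state_specifies_instrument_or_procedure: bool = False


def control_level(p: StateProvisions) -> ControlLevel:
    """Assign the most prescriptive level whose condition holds (an assumption)."""
    if p.state_specifies_instrument_or_procedure:
        return ControlLevel.PROCEDURAL
    if p.state_specifies_criteria:
        return ControlLevel.DEFINITIONAL
    if p.state_reviews_local_system:
        return ControlLevel.REMOTE_CONTROL
    return ControlLevel.LOCAL_DISCRETION
```

Checking the conditions from Level 4 downward means that a state which both approves local systems and specifies criteria is rated Level 3 rather than Level 2, which matches the description of the scale as running from least to most prescriptive.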

In the first phase of the study, 20 states were selected based on two criteria: whether a state was
centralized or decentralized in educational policy making, based on Pipho's (1991) classification, and
whether or not recent evaluation policy activity was reported by the Education Commission of
the States (Hazi & Arredondo Rucinski, 2005). In phase two, data were collected for ten randomly
selected states and added to the study (Hazi & Arredondo Rucinski, 2006). In phase three, data were 
collected from the remaining twenty states and merged with the total data set. The matrix was 
revised to display the data, and data were then analyzed through a process of deductive analysis. In 
deductive analysis of qualitative data, informal hypotheses are formulated and data analyzed to allow 
researchers to either confirm or reject the specific hypothesis statements. As hypotheses are rejected, 
new ones are formulated and additional data are collected as needed. This process continued 
throughout the data collection and analysis phases. Malen (2005) has categorized this type of policy 
analysis as a "theory of action" strategy in which broad policy initiatives can be examined and 
assessed based on the underlying "theories of action, or sets of principles and propositions,
orientations, and related assumptions" that underpin the policy (p. 196) and are either stated 
explicitly or can be inferred from written descriptions of the policy. 

Policy Activity for Teacher Evaluation in State Statutes and Regulations 

Results show that the states engaged in four general types of activity: adopting NGA 
strategies, asserting more oversight and involvement in local evaluation practices, decreasing the 
frequency of veteran teacher evaluation, and increasing the data used in evaluation. Each of these is 
described separately. Table 1 presents selected dimensions of our data that may affect a state’s move 
to change its evaluation statute and regulations: whether it is centralized or decentralized (Pipho, 
1991), whether or not it has collective bargaining (Education Commission of the States, 2002), its 
level of state control (Hazi & Arredondo Rucinski, 2005), and the frequency of evaluation for 
veteran or tenured teachers, whose performance has been judged satisfactory. As shown in Table 1, 
most state departments of education have involvement or control over evaluation policy and local 
practices at Level of State Control 2, 3, or 4. Thus, state departments of education in a combined 29 
states (58%) approve, monitor, or inspect local evaluation policy (11 or 22% at Level of State
Control 2), specify evaluation criteria (12 or 24% at Level of State Control 3), or require a specific 
instrument or procedure to evaluate teachers (6 or 12% at Level of State Control 4). 
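
The percentages in the paragraph above are each count divided by the fifty states. A minimal Python sketch of that arithmetic follows; it uses only the counts stated in that paragraph, and the variable and function names are illustrative.

```python
TOTAL_STATES = 50

# Counts as stated in the paragraph above, keyed by Level of State Control.
counts = {2: 11, 3: 12, 4: 6}


def share(count, total=TOTAL_STATES):
    """Return the count as a whole-number percentage of all states."""
    return round(100 * count / total)


shares = {level: share(n) for level, n in counts.items()}
combined = sum(counts.values())      # states at Level 2, 3, or 4
combined_share = share(combined)     # the combined percentage quoted in the text
```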

The reader will note (also in Table 1) that in six states (Georgia, Hawaii, Louisiana, 
Pennsylvania, Texas, West Virginia) the state control is at the highest level (4) on our rating scale, 
with schools required to use state department evaluation instruments and/or to follow certain 
identified procedures. Five of these six states are located in the South or East, which is consistent 
with a finding in Furtwengler's 1995 analysis, while the other (Hawaii) is located outside of the 
continental United States. We further noted that those states that were more active, i.e., with early 
attention to evaluation statute or department of education procedures, tended to be centralized in 
policy making, and to have higher levels of control (i.e., 2, 3, or 4) over local evaluation policy and 
practices. On the other hand, those states that were more decentralized in educational policy making 
and permitted collective bargaining tended to be Level 1 (or sometimes 2, on our rating scale), thus 
leaving many of the details of teacher evaluation to local discretion with some remote control. 

We find that veteran teachers are evaluated less frequently. Table 1 also displays these data.
While 21 states require annual evaluations for tenured teachers, 19 states have
adopted extended timelines: 11 states moved to once in three years, three to once in two years,
and five to once in five years. Eight states have undefined timelines and two have adopted
non-specific language such as "periodically" or "regularly" (see Table 1). Further, states which have a
mandate to evaluate teachers more frequently (i.e., either annually or once every 2 years) also tend to 
be those with collective bargaining. 

Table 1

State-by-state comparison of centralization, levels of control, collective bargaining status, and
frequency of tenured teacher evaluation

| State | Centralized/Decentralized?¹ | Level of Control | Collective Bargaining | Tenured Teacher² Evaluation Frequency |
|---|---|---|---|---|
| Alabama (AL) | C | 3 | No | Undefined |
| Alaska (AK) | D | 3 | Yes | Once per two years |
| Arizona (AZ) | C | 3 | No | Annual |
| Arkansas (AR) | C | 1 | No | Annual |
| California (CA) | C | 2 | Yes | Once per five years |
| Colorado (CO) | D | 2 | No | Annual |
| Connecticut (CT) | D | 3 | Yes | Annual |
| Delaware (DE) | D | 3 | Yes | Annual |
| Florida (FL) | C | 3 | Yes | Annual |
| Georgia (GA) | C | 4 | No | Annual |
| Hawaii (HI) | C | 4 | Yes | Once per five years |
| Idaho (ID) | D | 3 | Yes | Annual |
| Illinois (IL) | D | 2 | Yes | Once per two years |
| Indiana (IN) | C | 3 | Yes | Once per three years |
| Iowa (IA) | D | 2 | Yes | Once per three years |
| Kansas (KS) | D | 3 | Yes | Once per three years |
| Kentucky (KY) | C | 3 | No | Once per three years |
| Louisiana (LA) | C | 4 | No | Once per three years |
| Maine (ME) | D | 1 | Yes | Once per three years |
| Maryland (MD) | D | 1 | Yes | Once per five years |
| Massachusetts (MA) | D | 3 | Yes | Once per two years |
| Michigan (MI) | D | 1 | Yes | Undefined |
| Minnesota (MN) | D | 1 | Yes | Once per three years |
| Mississippi (MS) | C | 1 | No | Annual³ |
| Missouri (MO) | D | 1 | No | Once per five years |
| Montana (MT) | D | 1 | Yes | Undefined |
| Nebraska (NE) | D | 1 | Yes | Annual |
| Nevada (NV) | C | 1 | Yes | Annual |
| New Hampshire (NH) | D | 1 | Yes | Undefined |
| New Jersey (NJ) | D | 3 | Yes | Annual |
| New Mexico (NM) | C | 2 | No | Once per three years |
| New York (NY) | D | 3 | Yes | Annual |
| North Carolina (NC) | C | 2 | No | Once per three years |
| North Dakota (ND) | D | 1 | Yes | Annual |
| Ohio (OH) | D | 3 | Yes | Annual |
| Oklahoma (OK) | C | 1 | Yes | Annual |
| Oregon (OR) | D | 2 | Yes | Undefined |
| Pennsylvania (PA) | D | 4 | Yes | Annual |
| Rhode Island (RI) | D | 1 | Yes | Undefined |
| South Carolina (SC) | C | 3 | No | Once per three years |
| South Dakota (SD) | D | 1 | Yes | Annual |
| Tennessee (TN) | C | 2 | Yes | Variable |
| Texas (TX) | C | 4 | No | Once per five years |
| Utah (UT) | C | 1 | No | Undefined |
| Vermont (VT) | D | 1 | Yes | Undefined |
| Virginia (VA) | C | 1 | No | Undefined |
| Washington (WA) | D | 2 | Yes | Twice per year |
| West Virginia (WV) | C | 4 | No | Once per three years |
| Wisconsin (WI) | D | 1 | Yes | Undefined |
| Wyoming (WY) | D | 2 | No | Undefined |
| Totals | C = 21; D = 29 | Level 1: 19; Level 2: 10; Level 3: 15; Level 4: 6 | Yes: 33; No: 17 | Annual: 21; 1 in 3 yrs: 11; 1 in 2 yrs: 3; 1 in 5 yrs: 5; Undefined/Non-specific: 10 |

¹ Based on Pipho (1991) classification of state curriculum decision making.
² Based on a satisfactory performance rating.
³ For low-performing schools.


There is a definite trend across the states to adopt the strategies recommended by the 
National Governors Association. For example, all but nine states have adopted at least one of the 
NGA strategies (see Table 2). Training evaluators was one of the most frequently adopted strategies, 
with Texas requiring 36 hours in instructional leadership and 20 hours in evaluation instrument 
training. Alabama offers a one-week training with performance demonstration before administrators 
are certified to evaluate teachers. 

Defining teacher quality is also among the most frequently adopted strategies. Most states have taken the
approach of listing indicators of effective teaching, identifying standards, attributes, or performance 
dimensions. Kansas and New Jersey have identified the greatest number of items to define teaching 
(Kansas at 93 and New Jersey at 91). Broadening participation in evaluation is the next most 
frequently adopted NGA strategy (in 16 states). States have encouraged the representation of 
parents (in Florida and Utah), citizens and students (in Colorado, Kentucky, Louisiana and New 
York), and teacher associations (in 10 states) on the committees designing teacher evaluation 
systems. 

The three NGA strategies adopted less frequently include using peer review and/or
portfolios (12 states), increasing professional accountability through use of career ladders (10 states), 
and incorporating student achievement data into teacher evaluation ratings (12 states). We view the 
use of student performance data as a noteworthy addition to teacher evaluation. Ten states 
(California, Florida, Georgia, Tennessee, Colorado, New Mexico, Virginia, Indiana, Kansas, Maine) 
note that these data should be used in evaluation in some non-specified way, while two states 
(Delaware and Texas) calculate the proportion by fraction or percent of the rating scale to be based 
on student achievement. 

Table 2

States adopting NGA strategies

| NGA Strategies | States Adopting Strategy | Total States |
|---|---|---|
| Define teacher quality | AK, CA, CT, DE, FL, GA, IA, KS, KY, LA, MA, MD, NJ, OH, PA, SC, TN, TX, WV, WA | 20 |
| Focus on improving teaching practice through peer review and portfolios | CO, CT, FL, LA, MN, NY, SC, TN, UT (team), WV (improvement team); WV (optional); MD, NB (optional) | 12 |
| Incorporate student learning | CA, CO, DE, FL, GA, IN, KS, ME, NM, TN, TX, VA | 12 |
| Create professional accountability through career ladders | AZ, CA, CT, IA, NE, NJ, NY, TN, SD, UT | 10 |
| Train evaluators in preservice | | 20 |
| (a) in general | CT, MO, WV | |
| (b) specific state requirements | AL, FL, GA, LA, OK, SC | |
| (c) inservice training | AK, AR, AZ, CO, DE, IA, KY, ME, NE, TN, TX*, WV | |
| Broaden participation to include teachers and administrators | FL (parents), KY, LA, UT (parents), CO (citizens, students), CT, IA, IL, NJ (teacher association), NY (commission members); AK, KS, NV, NH, OR, SD | 16 |
| No strategy adopted | ID, MI, MT, NH, RI, VT, WI, WY | 8 |

* Texas requires 36 hours of Instructional Leadership and 20 hours of instrument training.


Our analysis of teacher evaluation statutes and regulations indicates that state departments of 
education are adopting a variety of oversight strategies. Table 3 presents an array of oversight from 
the least invasive (not specified) to the most (evaluating teachers and approving or developing guides 
for remediation). State strategies that seem to be predictable include: presenting a model of 
evaluation (1 state), requiring local districts to file their policy (2 states), mandating at least some 
oversight (1 state), getting reports on the results of evaluations (3 states), and monitoring LEA 
evaluation policy and practices (7 states). These represent minimal oversight, given the 
recommendations of the NGA. 

Table 3

Types of state involvement or control

| Type of Involvement or Control | State | Total States |
|---|---|---|
| Not specified | AZ, ID, MI, MT, NH, RI, SD, VT, WI | 9 |
| A mandate that an evaluation system exists | AL, ME, MD, MI | 4 |
| The state presents a model | AL, ME | 2 |
| The state requires that a system be on file | KS, NV | 2 |
| The state approves the evaluation system or alternatives | FL, KY, TN, TX, WV, CO, CT, DE, OH, PA, AR, IN, NE, WY | 14 |
| The state may monitor | AK, CT (5 yr reports), DE, IL, LA, MA, NM | 7 |
| The state handles appeals | KY | 1 |
| State committee for oversight | SC | 1 |
| On-site review of system | CA, KY | 2 |
| State receives reports on teacher results of evaluations | IL, LA, SC (available to colleges) | 3 |
| State requires increased frequency for low performing schools | MS, NM, NC | 3 |
| State may evaluate teachers | IL | 1 |
| State approves teacher remediation plans | SC | 1 |
| State develops guidelines for employee improvement plans | DE | 1 |
| Other: NCLB testing | CA | 1 |


The most frequently adopted form of state oversight is the approval of local evaluation 
policy (in 14 states). As shown in Table 3, other state strategies that seem to intervene considerably 
in local policy and practices include: on-site review of evaluation (California and Kentucky); 
increasing the frequency of evaluation in low performing schools (Mississippi, New Mexico, and 
North Carolina); evaluating teachers (Illinois); approving remediation plans (South Carolina); 
developing guidelines for improvement plans (Delaware); and handling appeals of evaluation 
(Kentucky). 

Finally, changes in teacher evaluation statutes and regulations are increasingly focused on 
data. Peterson's (2004) review of research on teacher evaluation was instructive in our portrayal of 
changes that seem to be occurring. Our categories for grouping these data are adding new data, 
collecting the data, using the data, and conducting evaluations. As shown in Table 4, states place 
greater emphasis on adding new data, collecting data, and using the data than on conducting 
evaluations. Thus, while veteran teachers are often evaluated less frequently, we expect stakes in 
evaluation to be higher, especially when student achievement data are being used. For example, in 
Delaware and Texas student achievement data are used in calculating teachers' evaluation ratings. 
Obviously, the stakes would be even higher were the ratings to be connected to salary or merit 
increases, an action currently in use or under debate. 

Table 4

State developments in teacher evaluation since NCLB

| Types of Developments* | States |
|---|---|
| None specified | AK, AR, AZ, ID, ME, MI, MD, MI, MT, NE, NH, RI, VT, WI, WY |
| Adding data | |
| • National Board Certification | SC |
| • Customer service data | UT |
| • Student progress, gains | CA, CO, DE, FL, GA, IN, KS, NM, MD, TN, TX, VA |
| • Graduation rates | IN |
| • Evaluation by teacher, pupils | NV |
| Collecting data | |
| • Classroom walk-through | TX |
| • Multiple methods in the evaluation process | CO, CT, DE, UT |
| • Portfolios and video portfolios | CT, IA, MO, MD, ND, WV (optional) |
| Using data | |
| • Student improvement as a fraction of teacher’s evaluation score | DE, TX |
| • Salary tied to satisfactory performance | OH |
| • Using evaluations when teachers become employed by other districts | LA, TX, WV, NC |
| • Aligned to school improvement | KY, CO, CT, DE, NJ |
| • Rubric | PA |
| • Narrative | NJ |
| Conducting evaluations | |
| • The pre-observation conference | TN, TX, OR |
| • Use of goal-setting | OH |
| • Expert external review when disputes in evaluation | UT |
| • A register of state forms | LA |
| • Use of teams | SC, TN, UT, WV (optional) |
| • Use of peers | FL, LA, SC, TN, NY, NC, OK |
| • Limits on lesson plan requirements | WV |
| • Professional plan in place | MA |
| • Limit “drive-by evaluation” | NV |
| • Post-evaluation interview | OR |
| • Choice of foundational, collegial team, or causal analysis process | SD |

* These types are in part based on Peterson’s (2004) categories.


We also identify a related finding regarding data — that is, while teachers are the primary 
focus of this round of policy activity, administrators are not immune. For example, the evaluation of 
administrators in five states (Delaware, Florida, Georgia, Tennessee and Washington) now includes 
data about teacher evaluation and student gains in their schools. In addition, in two states (Illinois, 
Texas), the evaluation of administrators now includes incentive pay and the use of independent 
evaluators from outside a district. 

The Practice of Teacher Evaluation 

First and foremost, teacher evaluation is likely to be complicated by the changes that states 
have made. Which NGA strategies a state adopts, and how many, are likely to determine 
whether problems are immediate or delayed. For example, if a state adopts only the NGA strategy of 
training, problems in the state’s schools may be delayed. However, if a state adds student progress 
data to evaluation and subsequently ties it to salary increases, we anticipate more problems, and we 
expect those problems to surface sooner rather than later. Furthermore, the complications will be 
mitigated or heightened by teacher-administrator relations in individual locales. 

Florida is a case in point regarding both student progress and performance-based pay, two 
high profile NGA strategies. Florida has a history of dabbling in pay for performance plans. In the 
1980s it used the controversial Florida Performance Measurement System that formulaically 
identified outstanding teachers and took principal judgment out of the equation until a court case 
dismantled its use (Hazi, 1989). In 2006, it unsuccessfully tried to link annual bonuses with the 
academic progress of students in what was called E-Comp or “Effectiveness Compensation” 
(Florida Department of Education, 2006; Pinzur, 2006). 

The Merit Award Program is its most recent effort to award teacher bonuses based on 
student performance on the Florida Comprehensive Assessment Test (FCAT) or other district 
designed tests for those teachers of subjects not covered by the FCAT (Simmonson, 2007). Only 7 
of Florida’s 67 school districts participated in the unpopular teacher performance pay plan. The 
Merit Award Program was the state’s fourth plan in six years (Merit pay plan’s unintended lesson, 
2008). We believe it is only a matter of time, in states such as Florida, before performance pay plans 
will be challenged in courts where judges might next legally define what constitutes student progress. 

Our second concern is that certain practices added to procedure in these states may further 
complicate evaluation and make it more ritualistic. Those practices added, as shown in Table 4, 
include a classroom walkthrough, multiple measures, customer service data, use of student 
achievement data, peer review, portfolios, goal setting, and reflection. While these terms sit largely 
undefined and ambiguous in state statute and regulation, how they are ultimately defined by those 
who train and conduct evaluations will largely determine if they are used to help or to control 
teachers. While each practice is well-intentioned, when introduced into the arena of teacher 
evaluation as a mandated practice, it can be misused. 

For example, consider the supervisory practices of pre-conference and goal-setting. While 
not unique to this decade of reform, they are examined here for a number of reasons. First, the 
pre-conference was one of the earliest supervisory practices, from clinical supervision, that found its way 
into evaluation statute. Second, it became a practice endorsed by teachers and teacher associations as 
a protection against unannounced observations. Similarly, goal-setting gave teachers a way to 
participate more fully in evaluation. Both found their way into practice in the 1980s and tended to be 
popular among teachers. 

The finding that states are specifically defining effective teaching in performance objectives 
is likely to lead to restricted definitions of teaching and learning (Lewis, 2007). Such a view of what 
teaching and learning entails focuses evaluation processes on the state's specific definition of quality 
teaching. In addition to being inconsistent with current research on student learning, these restricted 
definitions of teaching lead to increased checklists, walkthroughs, and increased specificity of 
procedures and instruments. For example, in one school in Georgia, regular walkthroughs are used 
to target specific teaching of an identified standard for a specified time period. 


Implications for Instructional Supervision 

To the extent that supervision and teacher evaluation are viewed as synonymous concepts, 
the implications described for evaluation will be the same for supervision. In several states, 
"recommended practices" believed by some educators to "show promise" in supervision are 
included in state statutes and department of education regulations, and have become 
institutionalized. Examples include alignment of teacher evaluation to school 
improvement (in five states: Colorado, Connecticut, Delaware, Kentucky, and New Jersey), peer 
review (in Florida, Louisiana, and Oklahoma), use of mentors (in South Carolina), use of 
self-evaluation and reflection (in Louisiana and Texas), and peer cognitive coaching and action research 
(in Tennessee). If these examples reflect the leading edge of a trend, they are troubling. If states are, 
indeed, embedding recommended professional development practices into the statute and regulation of 
teacher evaluation, then what appears on the surface to be an effort toward building teacher capacity 
may simply portend prescribed designs for required teacher learning activities, designs that are 
inconsistent with adult learning principles. We wonder to what extent these mandates will further reinforce the view of 
teacher evaluation as "ritual" (Hazi & Arredondo Rucinski, 2006). 

Our discovery of a large number of states focused on data and the subsequent development 
of data cottage industries have potential implications for supervision. D3M, or “data-driven 
decision-making,” the new “buzz word for this century,” is fueled by NCLB and its need for data about 
students (Mercurius, 2005). D3M requires “data repositories” to house, maintain, and analyze 
information to improve teaching and learning, work that was once done manually or with limited 
software such as Excel. One example of these industries is data warehousing, the storage of 
demographic and test data in one electronic location. Data warehousing allows schools to store all 
data and then “mine” it, i.e., call forth and reconfigure specific information with minimal effort 
(Cohen, 2003). More importantly, it promotes beliefs such as the following about the power of data: 

    Also, the more data we review, the more confidently we can draw our inferences. 
    For instance, if we see that a particular teacher has average students for three 
    consecutive years who perform below their classmates, we can conclude that the 
    teacher's effectiveness is below average, allowing supervisors to offer assistance 
    where it is most needed. Including large numbers of students in the comparison 
    makes our conclusions even more likely to be accurate. (Cohen, 2003, np) 

Other examples include data destruction companies and virtual supervision programs with 
IP-based videoconferencing equipment delivering video clips of teaching to data files of teaching to 
other locations (Amodeo & Taylor, 2004). 

We are concerned that such technology promotes surveillance, restricts access to data 
gathered, and perpetuates the illusion of objectivity. First, with student achievement data in the 
spotlight, some believe that principals will observe less. Indeed, our data show that state policies 
require that veteran teachers be evaluated less frequently. We wonder whether the achievement test 
is the new venue for teacher surveillance, replacing the once popular intercom as a listening device. 
We note that this surveillance of teachers seems consistent with the national trend where there is 
increased surveillance in the workplace to deter sexual harassment, accidents, theft, violence, 
sabotage, and goofing off (Kitchen, 2006) and in electronic search engines such as Google to deter 
on-line pornography (Hafner, 2006). Such acts appear innocuous and to benefit the greater good 
because they occur under the guise of increased productivity or safety. Second, access to achievement 
test data and its accompanying jargon (subtests, disaggregated scores, subgroups, test bank items, 
benchmarking, formative assessment) is now limited not only to those with the most testing 
knowledge, but also to those with special passwords. We worry that teachers may not be among 
those with equal access. 

Finally, what is most disturbing is the false confidence that accompanies numbers, as if they 
can replace professional judgments about teaching. Such confidence occurred with the Florida 
Performance Measurement System (Hazi, 1989), that was believed to be objective, precise, and 
research-based, and that resulted in a score calculated by a computer, replacing observer judgment as 
well as feedback. Now, instead of an observation instrument, data warehouses will supply judgments 
about teaching, as shown in the earlier Cohen (2003) quote, to say with certitude what is happening 
inside of a classroom, without even stepping inside one. 

We may be in a period of what we call distressing practices. When teachers and their 
professional associations are involved in evaluation policy making, the intent of their involvement 
has often been protection against distressing practice. One example is the pre-conference. 
Originating with clinical supervision, a rationale for practice born out of work with student teachers 
in Harvard’s MAT program, the pre-conference was designed to help both supervisor and teacher 
plan the lesson and prepare for the observation (Cogan, 1973). As its practice was adopted by 
schools, and sometimes legislated (in the case of California), it became a way for teachers to protect 
themselves from poor evaluation (Black, 1993). When it became institutionalized in statute and 
policy and applied to teacher evaluation, the pre-conference became a way for teachers to learn to 
do a “tap dance” more in tune with the administrator’s expectations (Garman & Hazi, 1988). While 
the pre-conference protected the teacher from poor evaluation, we now believe that peer, team, and 
mentor involvements may attempt to provide that same protection, and thus, become further 
complications to the practice of evaluation. It seems likely to us that supervision, forever entangled 
with evaluation, will most likely continue to be viewed cynically by both teachers and principals. 

Conclusion 

This research examined teacher evaluation statutes and department of education regulations 
promulgated since NCLB in the 50 states. Data were collected from state websites and analyzed 
deductively in terms of the strategies identified by the National Governors Association for 
underlying theories of action, trends, likely effects on teacher evaluation and implications for 
supervision. Identified trends show that the majority of states adopted NGA strategies, asserted 
more oversight and involvement in local evaluation practices, decreased the frequency of veteran 
teacher evaluation, and increased the data used in evaluation. While the effects of these policy 
actions on student learning remain unclear at this point, it is evident that states have moved forward 
in their adoption of the NGA strategies. 

The continued and long standing tensions between the helping and evaluative functions of 
supervision, the “renaming” of supervision as instructional leadership, and resulting interpretations 
of instructional leadership as teacher evaluation, support our earlier predictions that the dual 
functions, although intertwined throughout supervision history, are likely to remain incompatible 
(Hazi & Arredondo Rucinski, 2006). It seems unlikely to us that state department involvement 
viewed as increasingly invasive and controlling will lead to the development of ideal learning 
conditions aimed at improving teacher capacity. Hence, the implicit expectation from state 
departments and state legislatures that the policy actions described in this study will lead to 
improved student learning seems problematic. And, whether or not the changes will "transform and 
revolutionize" teacher evaluation in the long run remains to be seen. 

Research such as this contributes to the education policy literature regarding the standards 
and accountability initiatives aimed at improving student learning. It catalogues the efforts of the 50 
states to address the NGA strategies for school reform. If, as Porter and Chester (2004, p. 2) have 
argued, "a carefully crafted and continuously refined assessment and accountability program can lead 
to more effective schools and higher levels of student persistence and achievement," then examining 
changes in statutes and policy on teacher evaluation may shed light on the assumptions underlying 
such policies, and illustrate that "theories of action" connecting increased controls of teacher 
performance may rest on tenuous and uncertain linkages. 